Knowledge-Based Planning for Robustly Optimized Intensity-Modulated Proton Therapy of Head and Neck Cancer Patients

Purpose To assess the performance of a proton-specific knowledge-based planning (KBP) model in the creation of robustly optimized intensity-modulated proton therapy (IMPT) plans for treatment of advanced head and neck (HN) cancer patients.
Methods Seventy-three patients diagnosed with advanced HN cancer previously treated with volumetric modulated arc therapy (VMAT) were selected and replanned with robustly optimized IMPT. A proton-specific KBP model, RapidPlanPT (RPP), was generated using 53 patients (20 unilateral cases and 33 bilateral cases). The remaining 20 patients (10 unilateral and 10 bilateral cases) were used for model validation. The model was validated by comparing the target coverage and organ at risk (OAR) sparing in the RPP-generated IMPT plans with those in the expert plans. To account for the robustness of the plan, all uncertainty scenarios were included in the analysis.
Results All the RPP plans generated were clinically acceptable. For unilateral cases, RPP plans had higher CTV_primary V100 (1.59% ± 1.24%) but higher homogeneity index (HI) (0.7 ± 0.73) than had the expert plans. In addition, the RPP plans had better ipsilateral cochlea Dmean (−5.76 ± 6.11 Gy), with marginal to no significant difference between RPP plans and expert plans for all other OAR dosimetric indices. For the bilateral cases, the V100 for all clinical target volumes (CTVs) was higher for the RPP plans than for the expert plans, especially the CTV_primary V100 (5.08% ± 3.02%), with no significant difference in the HI. With respect to OAR sparing, RPP plans had a lower spinal cord Dmax (−5.74 ± 5.72 Gy), lower cochlea Dmean (left, −6.05 ± 4.33 Gy; right, −4.84 ± 4.66 Gy), lower left and right parotid V20Gy (left, −6.45% ± 5.32%; right, −6.92% ± 3.45%), and a lower integral dose (−0.19 ± 0.19 Gy). However, RPP plans increased the Dmax in the body outside of CTV (body-CTV) (1.2 ± 1.43 Gy), indicating a slightly higher hotspot produced by the RPP plans.
Conclusion IMPT plans generated by a broad-scope RPP model have a quality that is, at minimum, comparable with, and at times superior to, that of the expert plans. The RPP plans demonstrated a greater robustness for CTV coverage and better sparing for several OARs.


INTRODUCTION
Head and neck (HN) cancer therapy is both challenging and complicated due to the proximity of clinical target volumes (CTVs) to various critical organs such as the oral cavity, pharynx, larynx, parotids, spinal cord, and brainstem. Radiation therapy is frequently used for HN cancer as an adjuvant to surgery or chemotherapy. Intensity-modulated radiation therapy (IMRT), volumetric modulated arc therapy (VMAT), and intensity-modulated proton therapy (IMPT), all of which can deliver a highly conformal dose to the tumor while sparing organs at risk (OARs), are advanced radiation therapy techniques commonly used for treatment of HN cancer. Both VMAT and IMRT use photons to irradiate the patient, while IMPT uses protons. Proton beams deposit essentially no "exit dose" beyond the Bragg peak, which allows for steeper dose gradients and better OAR sparing than photon-based therapy. It is well documented that IMPT offers a superior dose distribution as well as reduced toxicity as compared with IMRT and VMAT in the treatment of HN cancers (1,2). Like IMRT, IMPT uses inverse planning optimization to achieve dosimetric objectives. However, the complexity of IMPT planning makes the quality of IMPT plans very dependent on planner experience and skill, especially for plans in complex anatomy such as the HN region. This may lead to larger variations in plan quality and suboptimal dose distributions (3–5).
Knowledge-based planning (KBP) tools, which incorporate prior treatment planning experience, have the potential to improve the quality and consistency of treatment plans (6–10). One of the commercially available KBP systems [RapidPlan™ (RP), Varian Medical Systems, Palo Alto, CA] employs a dose-volume histogram (DVH) estimation model trained from a library of high-quality treatment plans. Numerous studies have demonstrated that RP can generate IMRT and VMAT plans comparable with or better than the expert plans for a range of treatment sites (11–16). Recently, a proton-specific KBP system [RapidPlanPT™ (RPP), Varian Medical Systems, Palo Alto, CA] was developed to incorporate the physical traits of protons (e.g., no dose beyond the Bragg peak) into the DVH estimation model (17). A small number of publications have explored the usefulness of RPP for HN cancer. Delaney et al. originally described the principle of RPP and demonstrated the feasibility of generating clinically acceptable planning target volume (PTV)-based IMPT plans by RPP for HN patients (17,18). In their studies, a relatively narrow-scope model was trained and evaluated, in which all IMPT plans used the same dose prescription and a standardized field setup. We believe that, at this early stage, more studies are necessary to validate the reliability of the RPP model before it can be put into clinical use. In the present work, we built an RPP model with a wide variety of HN proton plans (e.g., customized field setup, different prescriptions, and both unilateral and bilateral cases). This model is broader in scope than those previously reported, and we assessed its performance in the creation of robustly optimized IMPT plans for HN cancer patients with different dose prescriptions and tumor localizations.

Patient Cohort and Intensity-Modulated Proton Therapy Planning
Seventy-three patients with advanced HN cancer located in the mid/lower HN region, including base of the tongue, tonsil, oropharynx, hypopharynx, parotid, and larynx, were included in this study. These patients were previously treated with VMAT using the simultaneous integrated boost (SIB) technique and were enrolled in a retrospective protocol approved by the institutional review board (IRB). Thirty of the patients underwent unilateral HN treatment, and the remaining 43 were treated with bilateral HN irradiation. For all patients, contrast and non-contrast planning CTs were acquired in a supine position with 1.5-mm slice thickness using the Siemens Somatom 16-slice CT simulator. All gross tumor volumes (GTVs), CTVs, and OARs, including the spinal cord, brainstem, parotids, constrictors, mandible, cochlea, larynx, carotids, and oral cavity, were delineated on the contrast CT, and these volumes were subsequently transferred to the non-contrast CT. For bilateral treatment, patients were treated with three dose levels: the primary CTV prescribed to 70 Gy; the secondary CTV prescribed to 66, 63, or 60 Gy; and the tertiary CTV prescribed to 56 Gy. For unilateral cases, either one or two dose levels were prescribed with some combination of doses at the levels of 66, 60, 55, 54, and 50 Gy.
For each patient, IMPT plans were generated using the multifield optimization (MFO) technique. The IMPT plans employed two to four fields depending on the target extent and anatomy. The number and arrangement of fields were selected by the expert planners based mainly on the tumor anatomy and location. For each field, a field-specific target was created encompassing all CTVs. These field-specific targets were then modified to avoid beams entering through the chin area or passing through teeth. Streaking artifacts caused by dental implants were delineated and overridden to an appropriate density value. The non-linear universal proton optimizer (NUPO 15.6, Eclipse, Varian Medical Systems) was utilized for optimization along with the proton convolution superposition algorithm (PCS 15.6, Eclipse, Varian Medical Systems) for dose calculation. A relative biological effectiveness (RBE) of 1.1 was used to weight the dose. The spot spacing was set to 0.425 times the energy-dependent in-air full width at half maximum (FWHM) spot size at the isocenter. All IMPT plans were robustly optimized using ±3 mm setup uncertainty (in the cardinal directions) along with ±3% proton range uncertainty, resulting in 12 uncertainty scenarios. The targets were the only structures selected for robust optimization. The worst-case scenario was required to achieve V95 > 95% (more than 95% of the volume receiving at least 95% of the prescription dose) for the CTVs while keeping the doses to normal tissues as low as possible. The dose constraints used for the OARs are shown in Table 1. All plans were normalized such that 95% of the primary CTV volume was covered by 100% of the prescription dose (V100 = 95%). All proton plans were created by an experienced proton dosimetrist and reviewed by a medical physicist.
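The count of 12 uncertainty scenarios can be reproduced with a short sketch. It assumes that each scenario pairs one single-axis setup shift (±3 mm along one of the three cardinal axes, six shifts in total) with one range error (+3% or −3%); the text gives only the magnitudes and the total, so this pairing is an assumption rather than a description of the optimizer's internals. The function names and the V95 values passed to the worst-case check are hypothetical.

```python
from itertools import product

SETUP_SHIFT_MM = 3.0   # setup uncertainty magnitude stated in the text
RANGE_ERROR_PCT = 3.0  # range uncertainty magnitude stated in the text


def uncertainty_scenarios(shift_mm=SETUP_SHIFT_MM, range_pct=RANGE_ERROR_PCT):
    """Return (dx, dy, dz, range_error_pct) tuples.

    Assumption: six single-axis shifts, each combined with two range errors.
    """
    shifts = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            shift = [0.0, 0.0, 0.0]
            shift[axis] = sign * shift_mm
            shifts.append(tuple(shift))
    return [s + (r,) for s, r in product(shifts, (range_pct, -range_pct))]


def worst_case_passes(v95_by_scenario, required_pct=95.0):
    """Check the robustness criterion: the worst scenario must have V95 > 95%."""
    return min(v95_by_scenario) > required_pct


scenarios = uncertainty_scenarios()
print(len(scenarios))  # 12
```

Evaluating a CTV's V95 in every scenario and passing the list to `worst_case_passes` mirrors the worst-case requirement stated above.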

Knowledge-Based Planning Model Configuration
The proton-specific KBP optimization tool RPP (Eclipse TPS, ver. 16.1, Varian Medical Systems) was used to create the KBP library. Model configuration in RPP consists of two phases: the data extraction phase and the model training phase. In the data extraction phase, the geometric and dosimetric features of selected structures are parameterized for use in model training. During the model training phase, the DVH estimation algorithm is applied to create a DVH estimation model. Individual structure objectives and priorities may be set manually or generated based on the training set and its principal components. As described by Delaney et al., RPP incorporates a simplified spread-out Bragg peak into the model and uses the geometry-based expected dose (GED) metric to estimate the distance of the voxels in each structure from the target surfaces. Delaney et al. have described RPP modelling, as well as the differences between the photon-based and proton-based models, in greater detail (17), so these details are not repeated here.
In our study, 53 IMPT plans consisting of 20 unilateral cases and 33 bilateral HN cases were included in the proton RPP model library. A defined objective list was implemented in the model after initial model training as shown in Table 2. The model quality was assessed using model-generated plots such as DVH plots, regression and residual plots based on principal component analysis (PCA), and some additional metrics (19). The coefficient of determination (R²) and average chi-square (χ²) were applied to measure the goodness of fit of the model for each trained OAR, where R² indicates the correlation between dosimetric and geometric features, while χ² represents the difference between the original and estimated data (19).
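As a generic illustration of the first goodness-of-fit metric, the sketch below computes a textbook coefficient of determination between observed values and the values fitted by a regression. It is not the vendor's internal computation, which the text does not describe; the function name and the toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values only: a dosimetric feature and its regression estimate.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 2))  # 0.98
```

A value near 1 means the fitted values explain most of the variance in the observed feature; a low value for a structure means the geometric features explain little of its dosimetric variation in the training set.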

Model Validation
The 20 patients (10 unilateral cases and 10 bilateral cases) who were not included in the model training served as the model-validation group. For each patient used in model validation, RPP plans were created using the same beam arrangement as the corresponding expert plans. Optimization was first performed using an objective list autogenerated by RPP. For some patients, if the aforementioned dose constraints were not met, one to two additional optimization iterations were performed with small changes to the original objective list to improve the CTV coverage or OAR sparing. The RPP plans were normalized in the same way as the expert plans (V100 = 95%).
The RPP plans were assessed and compared with the expert plans using the same clinical dose-volume constraints for CTVs and OARs. Additionally, we assessed the integral dose deposited in the body-CTV structure, defined as the external body contour with the CTV volume removed. The homogeneity index (HI) was also evaluated for RPP-based IMPT plans and compared with that of the expert plans. In this work, the HI was defined as (20, 21)

$$\mathrm{HI} = \frac{D_{2\%} - D_{98\%}}{D_p} \times 100\%$$

where $D_{2\%}$ is the dose to 2% of the CTV, $D_{98\%}$ is the dose to 98% of the CTV, and $D_p$ is the prescription dose for the CTV. The closer the HI value is to zero, the more homogeneous the plan is.
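The dose-volume indices used in this comparison can be sketched from a list of voxel doses. This minimal example assumes equal-volume voxels and takes the HI as (D2% − D98%)/Dp expressed in percent, built from the D2%, D98%, and Dp quantities defined above. The function names and the synthetic dose values are hypothetical and do not come from any patient plan.

```python
import math


def dose_at_volume(doses, volume_pct):
    """D_x%: minimum dose received by the hottest x% of (equal-volume) voxels."""
    ordered = sorted(doses, reverse=True)
    k = max(1, math.ceil(len(ordered) * volume_pct / 100.0))
    return ordered[k - 1]


def volume_at_dose(doses, threshold):
    """V_x: percentage of voxels receiving at least the threshold dose."""
    return 100.0 * sum(1 for d in doses if d >= threshold) / len(doses)


def homogeneity_index(doses, prescription):
    """HI in percent, assumed here to be (D2% - D98%) / Dp * 100."""
    d2 = dose_at_volume(doses, 2)
    d98 = dose_at_volume(doses, 98)
    return 100.0 * (d2 - d98) / prescription


# Synthetic CTV with 100 voxels between 66 Gy and about 72.2 Gy.
toy_ctv = [66.0 + 0.0625 * i for i in range(100)]
print(dose_at_volume(toy_ctv, 2))      # 72.125
print(dose_at_volume(toy_ctv, 98))     # 66.125
print(volume_at_dose(toy_ctv, 70.0))   # 36.0
print(round(homogeneity_index(toy_ctv, 70.0), 2))  # 8.57
```

A perfectly uniform dose gives D2% = D98% and therefore HI = 0, which matches the statement that values closer to zero indicate a more homogeneous plan.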
To take plan robustness into consideration, the dosimetric indices were averaged over all scenarios (12 uncertainty scenarios plus the nominal scenario) for each patient, and comparisons were carried out between the expert and RPP plans. All comparisons were performed with the two-sided Wilcoxon signed-rank test. A p-value <0.05 was considered statistically significant.
Table 3 reports the training results for the model. The R² was low for some structures such as the brainstem, larynx, and spinal cord, but the proximity of the χ² values (mean ± SD, 1.08 ± 0.02) to 1 indicates that the model is of good quality. Figure 1 shows the residual plots for some structures. The residual plots show how the original DVH of a structure differs from the estimated DVH, and they were used as a more realistic evaluation of potential influential points that can significantly affect the outcome of the DVH estimation model. Although previous studies have shown that removing outliers from a good-quality KBP model library with a sufficient population often does not have a significant impact on plan quality (12,22), outlier cases such as the one marked by the arrow in the constrictor plot were evaluated to determine whether the patient needed to be replanned. After review, we concluded that they did not need to be excluded, as most outliers were due to an anatomical difference or a difference in the location of the structure relative to the CTV; e.g., a large part of the constrictor overlapped with the CTV for the arrowed case. Thus, we decided not to remove any of the outliers from the model.

Model Validation Results
Most IMPT plans generated by the expert planners and RPP met the clinical constraints in Table 1. Some constraints were not met for very few cases due to the close proximity of some OARs to the CTVs. After review, these plans were deemed clinically acceptable. Table 4 summarizes the comparison of dosimetric indices, presented as mean ± SD, between the RPP-generated plans and the expert plans for 10 unilateral (Table 4A) and 10 bilateral (Table 4B) cases in the validation group. The range of each dosimetric index is also presented in brackets as (min, max) in Table 4. The dosimetric indices from nominal plans as well as the dosimetric indices averaged over all scenarios are listed in Table 4. To take plan robustness into consideration, we focus only on the results of the dosimetric indices averaged over all scenarios. Figures 2A, B show the differences in the averaged dose-volume indices over all scenarios between the RPP and expert plans for unilateral cases, and Figures 2C, D show the differences for bilateral cases.
Table notes: V95 represents the relative volume receiving 95% or more of the prescription dose; Dmax represents the maximum dose or relative dose delivered to the structure; V20Gy and Dmean represent the relative volume of the structure receiving more than 20 Gy and the mean dose to the volume, respectively. CTV, clinical target volume; OAR, organ at risk; IMPT, intensity-modulated proton therapy.
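The scenario-averaged, paired comparison summarized in Table 4 can be sketched as follows. The example averages an index over the 13 scenarios of one plan and then applies an exact two-sided Wilcoxon signed-rank test to paired per-patient values by enumerating all sign assignments, which is feasible for 10 patients. The text does not state whether an exact or an approximate p-value was used, so the exact form is an assumption; zero differences are dropped and tied ranks share their average rank, which is one common convention. All names and numbers are hypothetical.

```python
from itertools import product


def scenario_average(values):
    """Average one dosimetric index over all scenarios of a plan."""
    return sum(values) / len(values)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Return (W_plus, two-sided exact p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    extreme = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical scenario-averaged index for 10 patients (not study data).
rpp = [float(v) for v in range(1, 11)]
expert = [0.0] * 10
w_plus, p_value = wilcoxon_signed_rank_exact(rpp, expert)
print(w_plus, p_value)  # 55.0 0.001953125
```

With 10 pairs the smallest attainable two-sided p-value is 2/1024, which is why a validation group of this size can still reach significance at the 0.05 level when the differences are consistent in sign.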
In unilateral cases, RPP plans achieved more robust CTV coverage with a moderately higher CTV_primary V100 (1.59% ± 1.24%), whereas the expert plans were more homogeneous, with a slightly lower CTV_primary HI than the RPP plans (difference, 0.7 ± 0.73). The Dmax of the mandible from RPP plans was marginally higher than that of the expert plans (0.62 ± 0.25 Gy), but the RPP plans had a better ipsilateral cochlea Dmean (−5.76 ± 6.11 Gy). For other OAR dose-volume indices, there was no statistically significant difference between RPP and expert plans. In the bilateral cases, the V100 for all CTVs prescribed with different dose levels was higher for the RPP plans, especially for CTV_primary V100 (5.08% ± 3.02%), indicating that RPP plans were more robust for CTV coverage. There was no statistically significant difference in the CTV_primary HI between the expert and RPP plans. Concerning the DVH of bilateral cases, the RPP plans did a better job of sparing the brainstem, spinal cord, cochlea, and parotids. Figure 4 shows the dose distributions of the RPP and expert plans for an example bilateral case from the validation group.

DISCUSSION
This work demonstrated that a proton-specific KBP model, RPP, can generate high-quality IMPT plans for HN cancer patients. One of the benefits of employing RPP is its high efficiency. On average, it required about 20 min to complete the prediction, optimization, and dose calculation when using the RPP model. In comparison, it typically took experienced dosimetrists more than 2 h to complete an HN IMPT plan in our study. Moreover, our results indicate that the plans generated by RPP have greater robustness with respect to CTV coverage under the uncertainty parameters applied (3% range uncertainty and 3 mm setup uncertainty), which is consistent with the results from our previous study (23). For the unilateral cases, the RPP plans achieved OAR sparing comparable with that of the expert plans except for the ipsilateral cochlea, where the RPP plans delivered a lower dose. Regarding bilateral cases, the RPP plans improved the sparing of the brainstem, spinal cord, cochlea, parotids, and constrictors without reducing the plan homogeneity when compared with the expert plans. In addition, a reduction of integral dose, which is one of the main advantages of proton therapy, was observed in the RPP plans compared with the expert plans for bilateral cases. As a tradeoff, the bilateral RPP plans produced a slightly higher hotspot than the expert plans in the body outside of CTVs (1.69 ± 1.39 Gy). In general, our results are consistent with previous studies showing that RPP plans were at least equivalent to, if not better than, the expert plans (18,24,25). In contrast to the earlier studies by Delaney et al., which employed cases with the same prescription and standardized beam arrangement for model training and validation (17,18), our model was broader in scope in that it included cases prescribed with varying dose levels and using different customized beam arrangements.
The results suggest that this broad-scope model can create IMPT plans of good quality in the HN region regardless of the beam arrangement and prescription. It has been demonstrated that the quality of VMAT plans for HN cancer created by the RP model was independent of prescription and beam geometry (26). A study comparing a proton model trained with customized beam number and arrangement with another model trained with standardized beam number and arrangement for hepatocellular carcinoma treatment indicated that the two models performed equivalently, with no statistically significant difference for almost all dose-volume parameters (24). However, as the quality of proton plans is more dependent on the beam arrangement than that of photon plans, the impact of training the model with IMPT plans that use different beam arrangements versus a standardized beam arrangement should be investigated for HN IMPT models. Future work should examine whether integrating an automated beam angle selection algorithm, which is under investigation (27), can further improve plan quality and efficiency. Concerning the treatment area, this study combined both unilateral and bilateral cases in the model training. In some photon KBP studies, models trained by combining unilateral and bilateral plans showed high quality in treatment of HN cancer (28). Another investigation revealed that a photon model trained with unilateral cases was able to generate high-quality VMAT plans for bilateral breast treatments (29).
It is not yet clear whether a combined model or a specific model is better for generating HN IMPT plans. In photon therapy, one study showed that a specific model resulted in improved quality for liver cancer (30), while another study revealed no difference in quality between a specific model and a combined model for prostate cancer (31). Therefore, it is worthwhile to explore whether there is any benefit in using specific models, with unilateral and bilateral cases separated, for IMPT plan generation in HN cancer treatment.
This study included 53 cases for model training, and no outlier was removed from the model. A potential outlier identified by the RPP system is a plan that differs to a statistically significant degree from the whole population in the model. However, earlier studies by Delaney et al. and Hussein et al. compared the quality of the plans generated by an outlier-free model with that of a model without outlier removal and demonstrated that a small number of outliers does not significantly affect the plan quality (12,22). Our previous investigation by Bossart et al. (31) also showed that the differences between plans generated by a refined KBP model, from which the dosimetric outliers were eliminated, and plans generated by the original KBP model were insignificant. According to the results, we believe that 53 patients should be enough to generate a reliable model, but it would be necessary to investigate the influence of the model size on the IMPT plan quality. One limitation of this study is that the validation set comprised only 20 patients (10 unilateral cases and 10 bilateral cases), which may not be sufficient to confirm the reliability of the model, as RPP exploration is at an early stage. That being said, many publications on KBP photon and proton models have included small numbers of plans for validation, and the vendor's recommendation is 10 validation cases to show that the model is working sufficiently (12,16,17,24,31,32).

CONCLUSION
This work explored the performance of a broad-scope proton-specific KBP model in generating robustly optimized IMPT plans for HN cancer patients. The results demonstrated that the IMPT plans created by the model are of high quality that is at least comparable with, and in some ways superior to, that of the expert plans. The IMPT plans generated by the model had greater robustness for CTV coverage and better sparing for several OARs. More studies should be done to evaluate the reliability of the RPP model.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the University of Miami Institutional Review Board. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
ND designed the study, oversaw the whole study, and trained and validated the model library. YX carried out all data analysis and wrote the manuscript. JC, MO, and MB created IMPT expert plans for model training and validation. EB provided expertise on the proton model training and validation. KP provided his expertise in data analysis. TD provided his expertise in the study design and evaluated the IMPT plans. SS and MS provided contouring of the cases included in this study and reviewed the assessment of the cases. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported in part by a research grant from Varian Medical Systems, Palo Alto, CA (GR013242).